# installing packages
#install.packages("tidyverse")
#install.packages("readxl")
#install.packages("writexl")
#install.packages("here")
#install.packages("writexl")
# another option for installing packages
#install.packages(c("tidyverse", "readxl", "writexl", "here", "writexl"))
library(tidyverse)
library(readxl)
library(writexl)
library(here)Why am I here?
Welcome to December! 2024 is coming to an end, and I cannot wait to see what 2025 has in store for us! As a reminder, each lesson is designed as a 5 - 10 minute virtual session conducted for EnCompass staff to expand their skills with data, and the means of learning is the R programming language. When I started learning R, I dove into Datacamp and played with the Gapminder dataset a lot. Then, one day I was asked to look at a dataset that I received over email. No problem, I thought. . Hmmmm…This data is not magically already here…How do I put this in my RStudio environment? I couldn’t print it and feed it to it…I couldn’t scan it….Could I drop the file in?….I was lost. I didn’t understand how to import a file. I walked down the hall and asked a friend and 30 seconds later life was good. I don’t want you to go through this so consider this post your friend down the hall. Enjoy!
Learning objectives
For this session, the learning objective is to:
Import data from a .csv file and a .xlsx file
Write data to a .xlsx file
Wait, how does the data get into the Rstudio thingy?
Most data that is used in the workplace comes in the form of a .csv file or a Excel file. If you’re lucky, maybe you can import data directly from your company’s database or you download reliable data from the internet using an API. The reality is that you’ll probably get emails with Excel files attached so it’s important to know how to access these.
Install and load packages
If you already have the packages below installed, then you don’t need to install them again. I #’d the install.packages() lines so that the code doesn’t run for me. You can delete the #’s if you need to install the packages.
Then, activate the packages with the library() so they are active in your session.
Read in files
Previously our data was included in the gapminder package. You can find lots of training data sets in packages, and this is really useful, but it’s not how you will get most of your data in the workplace. So, let’s get to it.
CSV file
# read the data in and assign it to a variable
df_csv <- read_csv(here::here("posts/read file/testdata2.csv"))
#check the data
dplyr::glimpse(df_csv)Rows: 20
Columns: 9
$ respondentid <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ location <chr> "Lubelskie", "Lubelskie", "Lubelskie", "Lubelskie", "Lub…
$ type_visit <chr> "in person", "in person", "in person", "in person", "in …
$ gender <chr> "M", "F", "M", "F", "F", "F", "M", "M", "M", "F", "F", "…
$ v1 <chr> "yes", "no", "yes", "no", "yes", "no", "yes", "yes", "no…
$ v2 <dbl> 1, 5, 6, 2, 7, 1, 3, 3, 5, 7, 8, 9, 4, 2, 3, 1, 6, 4, 2,…
$ text_response <chr> "I liked the service.", "I did not like the service.", "…
$ lat <dbl> 51.01707, 51.27283, 50.92471, 50.35257, 51.50020, 51.745…
$ long <dbl> 23.12068, 23.53005, 23.57209, 23.03679, 23.24230, 21.981…
Success!
Excel file
# read the data in and assign it to a variable
df_xlsx <- read_xlsx(here::here("posts/read file/testdata2.xlsx"))
#check the data
dplyr::glimpse(df_xlsx)Rows: 20
Columns: 9
$ respondentid <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ location <chr> "Lubelskie", "Lubelskie", "Lubelskie", "Lubelskie", "Lub…
$ type_visit <chr> "in person", "in person", "in person", "in person", "in …
$ gender <chr> "M", "F", "M", "F", "F", "F", "M", "M", "M", "F", "F", "…
$ v1 <chr> "yes", "no", "yes", "no", "yes", "no", "yes", "yes", "no…
$ v2 <dbl> 1, 5, 6, 2, 7, 1, 3, 3, 5, 7, 8, 9, 4, 2, 3, 1, 6, 4, 2,…
$ text_response <chr> "I liked the service.", "I did not like the service.", "…
$ lat <dbl> 51.01707, 51.27283, 50.92471, 50.35257, 51.50020, 51.745…
$ long <dbl> 23.12068, 23.53005, 23.57209, 23.03679, 23.24230, 21.981…
Success!
Mutate the dataset
We can see that both datasets are identical with 20 rows (observations) across 9 columns. So, pick either one and we’ll add a variable called income using the mutate() function. This will add a column, and we will save it under as a new object that we will write back to an Excel file so we can email to our colleague.
# add a new variable to the dataset
df_add <- df_xlsx |> #pipe df to the mutate function
dplyr::mutate(income = sample(1000:7500, 20))
#view first 6 rows
head(df_add)# A tibble: 6 × 10
respondentid location type_visit gender v1 v2 text_response lat long
<dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 1 Lubelskie in person M yes 1 I liked the … 51.0 23.1
2 2 Lubelskie in person F no 5 I did not li… 51.3 23.5
3 3 Lubelskie in person M yes 6 The service … 50.9 23.6
4 4 Lubelskie in person F no 2 Amazing 50.4 23.0
5 5 Lubelskie in person F yes 7 Poor 51.5 23.2
6 6 Lubelskie remote F no 1 Some great t… 51.7 22.0
# ℹ 1 more variable: income <int>
#view last 6 rows
tail(df_add)# A tibble: 6 × 10
respondentid location type_visit gender v1 v2 text_response lat long
<dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 15 Opolskie remote M no 3 I would have … 50.5 17.6
2 16 Opolskie remote F no 1 Yes, the team… 50.8 18.4
3 17 Opolskie remote F no 6 I didn't appr… 50.5 17.7
4 18 Opolskie telephone F yes 4 More support … 50.8 17.3
5 19 Opolskie telephone M no 2 I was told th… 51.0 18.5
6 20 Opolskie telephone M yes 2 I forgot my I… 50.7 17.3
# ℹ 1 more variable: income <int>
We’re using the basic manipulation skills that we saw with the Gapminder dataset in the previous lessons.
Write it to Excel or CSV
Here all we need to do is specify the object and provide it a path where we want the file to be saved. You should write it to somewhere in your documents folder to avoid any issues with One Drive or Google Drive. N.B. with write_csv() path becomes file.
write_xlsx(x = df_add, path = here::here("posts/read file/test_out.xlsx"))
write_csv(x = df_add, file = here::here("posts/read file/test_out.csv"))Now you can email this updated .xlsx/.csv file to your colleague. :)
Have fun!
Now it’s your turn practice! Below is a fully functioning code editor with starting code in place. Feel free to experiment with different grouping variables in the group_by() call or to adjust the summary statistic in summarize(). Then, have fun with the plot!